[fix][broker] Ignore and remove the replicator cursor when the remote cluster is absent#19972
Conversation
|
@BewareMyPower Please add the following content to your PR description and select a checkbox: |
poorbarcode
left a comment
There was a problem hiding this comment.
ShadowReplicator hits this issue, too. Could you please also fix it?
I think it's another issue. You can open another PR for it. |
… cluster is absent ### Motivation Sometimes when a remote cluster is deleted, the replication cursor might still exist for some topics. In this case, creating producers or consumers on these topics will fail. Here is a log observed in a production environment: > WARN org.apache.pulsar.broker.service.BrokerService - Replication or > dedup check failed. Removing topic from topics list > persistent://public/__kafka/__consumer_offsets-partition-40, > java.util.concurrent.CompletionException: java.lang.RuntimeException: > org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException: > kop If it happened, unloading the topic or restarting the broker could not help. We have to remove the cursor manually. ### Modificatons In `addReplicationCluster`, before getting the replication client, check the namespace policy and topic policy first. If the remote cluster does not exist, skip adding the replication client and remove the cursor. ### Verifications `PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor` is added to verify `PersistentTopic#initialize` will succeed and the zombie replicator cursor will be removed.
1469b85 to
8aeb37d
Compare
|
Hi, @poorbarcode @315157973 Now I checked the namespace and topic policies to determine whether to delete the cursor. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #19972 +/- ##
============================================
- Coverage 72.89% 72.85% -0.05%
+ Complexity 31619 31613 -6
============================================
Files 1861 1861
Lines 137356 137375 +19
Branches 15117 15120 +3
============================================
- Hits 100131 100089 -42
- Misses 29271 29329 +58
- Partials 7954 7957 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| .thenCompose(__ -> checkReplicationCluster(remoteCluster)) | ||
| .thenCompose(clusterExists -> { | ||
| if (!clusterExists) { | ||
| log.warn("Remove the replicator because the cluster '{}' does not exist", remoteCluster); | ||
| return removeReplicator(remoteCluster).thenApply(__ -> null); |
There was a problem hiding this comment.
Why don't we clean up when the Policy is removed?
There was a problem hiding this comment.
Sorry I missed this comment when I merged. And yes, we should do that as well. WDYT? @poorbarcode
There was a problem hiding this comment.
Why don't we clean up when the Policy is removed?
When the policy is removed, the cursor will not be clean If this topic is not online. see
|
This PR introduced a new flaky test, reported as #20010 . @BewareMyPower do you have a chance to take a look? thanks |
|
Okay, I assigned that issue to me first. |
| replicators.get(remoteCluster).disconnect().thenRun(() -> { | ||
|
|
||
| Optional.ofNullable(replicators.get(remoteCluster)).map(Replicator::disconnect) | ||
| .orElse(CompletableFuture.completedFuture(null)).thenRun(() -> { |
There was a problem hiding this comment.
I think not. See
public static <U> CompletableFuture<U> completedFuture(U value) {
return new CompletableFuture<U>((value == null) ? NIL : value);
}so completedFuture(null) won't create a new instance.
… cluster is absent (#19972)
… cluster is absent (apache#19972) (cherry picked from commit d1fc732) (cherry picked from commit b93176c)
|
As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label |
…ting logic for better handle the checkpoint. (apache#19972) * Change the initial start cursor and stop cursor to better handle the consuming behaviors. * Create the initial subscription instead seek every time. This should fix the wrong position setting. * Fix the wrong stop cursor, make sure it stops at the correct space * Drop Consumer.seek() for apache#16171
Motivation
Sometimes when a remote cluster is deleted, the replication cursor might still exist for some topics. In this case, creating producers or consumers on these topics will fail.
Here is a log observed in a production environment:
If it happened, unloading the topic or restarting the broker could not help. We have to remove the cursor manually.
Modifications
In
addReplicationCluster, before getting the replication client, check the namespace policy and topic policy first. If the remote cluster does not exist, skip adding the replication client and remove the cursor.Verifications
PersistentTopicTest#testCreateTopicWithZombieReplicatorCursoris added to verifyPersistentTopic#initializewill succeed and the zombie replicator cursor will be removed.Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: BewareMyPower#23